
    Large-scale dimensionality reduction using perturbation theory and singular vectors

    Massive volumes of high-dimensional data have become pervasive, with the number of features significantly exceeding the number of samples in many applications. This creates a bottleneck for data mining applications and amplifies the computational burden of machine learning algorithms that perform classification or pattern recognition. Dimensionality reduction can address this problem in two ways: feature selection (FS) and feature extraction. In this thesis, we focus on FS because, in many applications such as bioinformatics, domain experts need to validate a set of original features to corroborate the hypotheses of the prediction models. In processing high-dimensional data, FS mainly involves detecting a limited number of important features among tens or hundreds of thousands of irrelevant and redundant ones. We start by filtering the irrelevant features using our proposed Sparse Least Squares (SLS) method, in which a score is assigned to each feature and low-scoring features are removed using a soft threshold. To demonstrate the effectiveness of SLS, we used it to augment well-known FS methods, achieving substantially reduced running times while improving or at least maintaining the prediction accuracy of the models. We then developed a linear FS method (DRPT) which, after data reduction by SLS, clusters the reduced data using perturbation theory to detect correlations between the remaining features. Important features are ultimately selected from each cluster, discarding the redundant features. To extend the applicability of clustering in grouping redundant features, we proposed a new Singular Vectors FS (SVFS) method that both removes irrelevant features and effectively clusters the remaining ones, so that the features in each cluster are correlated only with one another. The important features selected independently from the different clusters comprise the final ranking.
Devising thresholds for filtering irrelevant and redundant features makes our model adaptable to the particular needs of various applications. A comprehensive evaluation on benchmark biological and image datasets shows the superiority of our proposed methods over state-of-the-art FS methods in terms of classification accuracy, running time, and memory usage.
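The filter step described above (score each feature, then drop low scorers with a soft threshold) can be sketched in a few lines. This is an illustrative stand-in using a simple correlation-with-target score, not the actual SLS scoring; the `keep_ratio` cutoff is likewise an assumed knob:

```python
import numpy as np

def filter_irrelevant(X, y, keep_ratio=0.1):
    """Score each feature by absolute correlation with the target, then
    keep only the features scoring above a quantile-based soft threshold.
    Illustrative stand-in for an SLS-style relevance filter."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # per-feature absolute correlation with the label vector
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    threshold = np.quantile(scores, 1.0 - keep_ratio)  # soft threshold
    keep = np.where(scores >= threshold)[0]
    return keep, scores

# typical high-dimensional regime: far more features than samples
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1000))          # 50 samples, 1000 features
y = X[:, 3] + 0.1 * rng.normal(size=50)  # only feature 3 is relevant
keep, scores = filter_irrelevant(X, y, keep_ratio=0.01)
print(3 in keep)
```

The quantile-based cutoff adapts to the score distribution of each dataset, which mirrors the adaptability-via-thresholds point made above.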

    Considerations for health care institutions training large language models on electronic health records

    Large language models (LLMs) like ChatGPT have excited scientists across fields; in medicine, one source of excitement is the potential applications of LLMs trained on electronic health record (EHR) data. But there are tough questions we must first answer if health care institutions are interested in training LLMs on their own data: should they train an LLM from scratch or fine-tune an open-source model? For health care institutions with a predefined budget, what are the biggest LLMs they can afford? In this study, we take steps towards answering these questions with an analysis of dataset sizes, model sizes, and costs for LLM training using EHR data. This analysis provides a framework for thinking about these questions in terms of data scale, compute scale, and training budgets.
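For intuition on how such budget questions can be framed, a generic back-of-envelope uses the common ~6·N·D approximation for training FLOPs (N parameters, D tokens). This is a standard rule of thumb, not the paper's actual analysis, and the GPU throughput, utilization, and hourly price below are assumed values:

```python
def training_cost_estimate(n_params, n_tokens,
                           flops_per_gpu_per_s=312e12,  # assumed peak BF16 throughput
                           utilization=0.4,             # assumed hardware utilization
                           usd_per_gpu_hour=2.0):       # assumed cloud price
    """Rough training cost from the common ~6*N*D FLOPs approximation."""
    total_flops = 6 * n_params * n_tokens
    gpu_seconds = total_flops / (flops_per_gpu_per_s * utilization)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * usd_per_gpu_hour

# e.g. a hypothetical 1B-parameter model on 20B EHR-derived tokens
hours, usd = training_cost_estimate(1e9, 20e9)
print(f"{hours:,.0f} GPU-hours, ~${usd:,.0f}")
```

Varying `n_params` against a fixed budget inverts the question into "what is the biggest model we can afford", which is the framing the abstract describes.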

    Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes

    The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses will augment healthcare providers' decision-making and improve the quality of care for patients. The goal for participants is to develop models that generate a list of diagnoses and problems using input from the daily care notes collected during the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. Additionally, the techniques and results of the evaluation of the different approaches tried by the participating teams are summarized. Comment: To appear in the Proceedings of the 5th BioNLP Workshop at AC

    Immune-Based Pathogenesis of Sulfur Mustard; Much Still Needs to Be Done!

    EDITORIAL

    Progress Note Understanding -- Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 Shared Task

    Daily progress notes are a common note type in the electronic health record (EHR) in which healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat, with extraneous information that distracts from the diagnoses and treatment plans. Application of natural language processing (NLP) to the EHR is a growing field, with the majority of methods focused on information extraction. Few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of tasks. The Assessment and Plan Reasoning task focuses on the most critical components of progress notes, the Assessment and Plan subsections, where health problems and diagnoses are contained. The goal of the task was to develop and evaluate NLP systems that automatically predict causal relations between the overall status of the patient contained in the Assessment section and each component of the Plan section, which contains the diagnoses and treatment plans, thereby identifying and prioritizing diagnoses as a first step in diagnostic decision support for finding the most relevant information in long documents like daily progress notes. We present the results of the 2022 N2C2 Track 3 and provide a description of the data, evaluation, participation, and system performance. Comment: To appear in Journal of Biomedical Informatic

    Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning

    Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components of clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models, as well as multi-task versus single-task training, with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general-domain counterpart by a large margin, establishing new state-of-the-art performance with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks. Comment: Accepted to the Proceedings of the 5th Clinical NLP Workshop at AC
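The ROUGE-L figure quoted above scores a generated summary by the longest common subsequence (LCS) it shares with a reference. A minimal token-level sketch of the metric (whitespace tokenization; real evaluations typically add stemming and multi-reference handling):

```python
def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-score: LCS length over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    # dynamic-programming LCS table, dp[i][j] = LCS of c[:i], r[:j]
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c):
        for j, rt in enumerate(r):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if ct == rt
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    p, q = lcs / len(c), lcs / len(r)                 # precision, recall
    return (1 + beta**2) * p * q / (q + beta**2 * p)  # F-score

# hypothetical problem-summary pair: LCS = "acute on chronic disease"
score = rouge_l("acute kidney injury on chronic disease",
                "acute on chronic kidney disease")
```

Because LCS preserves token order without requiring contiguity, ROUGE-L rewards summaries that keep the clinical phrasing structure of the reference, not just its vocabulary.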

    Teratogenic effects of carbamazepine on embryonic eye development in pregnant mice

    Background: Carbamazepine is an antiepileptic drug used widely for the treatment of epileptic seizures and neuropathic pain. Several malformations in humans, mainly neural tube defects, have been reported as a consequence of its use during pregnancy. The association between maternal use of carbamazepine and congenital eye malformations is not well understood. Objective: The purpose of this study was to examine this association after intraperitoneal injection of carbamazepine during the period of organogenesis in mice. Methods: BALB/c timed-pregnant mice were divided into 4 groups: 2 experimental and 2 control. The two experimental groups received daily intraperitoneal injections of 15 mg/kg (group I) or 30 mg/kg (group II) of carbamazepine on gestational days 6 to 15. The two control groups received normal saline or Tween 20 (polysorbate 20). Dams underwent Cesarean section on gestational day 18 and embryos were harvested. External examination for eye malformations, routine histological processing of malformed fetuses to study eye morphology, and skeletal staining were performed. Results: The mean weight and crown-rump length of the fetuses in both experimental groups were significantly reduced compared with those of the control groups. Various malformations were detected, such as brachygnathia, calvarial deformity, vertebral deformity, short tail, and brachydactyly. Premature opening of one or both eyes with mild to severe exophthalmos occurred in the 2 experimental groups. Deformed lens, retinal folds with undeveloped layers, and corneal folds with absence of surface epithelium were detected in both experimental groups. Conclusions: This study, to the best of our knowledge, showed for the first time that intraperitoneal administration of carbamazepine at clinically comparable doses during organogenesis can induce several eye malformations in mice. The implication of these results needs to be considered when carbamazepine is administered during human pregnancy.
© 2010 Informa UK Ltd

    Robust optimization of train scheduling with consideration of response actions to primary and secondary risks

    Nowadays, with the rapid development of rail transportation systems, passenger demand and the possibility of risks occurring in this industry have increased. These conditions create uncertainty in passenger demand and adverse impacts resulting from risks, which jeopardize precise planning. To deal with uncertainty and lessen negative impacts, robust optimization of the train scheduling problem in the presence of risks is crucial. A two-stage mixed-integer programming model is proposed in this study. In the first stage, the nominal train scheduling problem minimizes the total travel time and optimally determines the decision variables of the train timetables and the number of train stops. In the second stage, a robust optimization model is developed with the aim of minimizing unsatisfied demand and reducing passenger dissatisfaction. Additionally, the proposed approach identifies the set of optimal risk response actions in the presence of primary and secondary risks in the train scheduling problem. A real-world example is provided to demonstrate the model's effectiveness and to compare the developed models. The results demonstrate that secondary risk plays a significant role in the selection of optimal response actions. Furthermore, in the face of uncertainty, robust solutions can significantly and effectively minimize unsatisfied demand with only a slight rise in the travel time and number of stops obtained from the nominal problem.
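The second-stage idea of hedging a timetable against demand uncertainty can be illustrated with a toy brute-force version: pick the stop pattern minimizing worst-case unmet demand plus travel-time cost across scenarios. The numbers are hypothetical, and the paper's actual model is a two-stage mixed-integer program, not enumeration:

```python
from itertools import product

# two demand scenarios at intermediate stations B and C (passengers wanting a stop)
scenarios = [{"B": 40, "C": 10},
             {"B": 15, "C": 50}]  # uncertainty: demand may shift to C
STOP_COST = 3    # extra travel minutes per scheduled stop
PENALTY = 1.0    # cost per unsatisfied passenger

def worst_case_cost(stops):
    """Travel-time cost of a stop pattern plus worst-case unmet demand."""
    travel = STOP_COST * len(stops)
    worst_unmet = max(sum(d for s, d in scen.items() if s not in stops)
                      for scen in scenarios)
    return travel + PENALTY * worst_unmet

# enumerate all stop patterns over {B, C} and pick the robust one
patterns = [frozenset(s for s, bit in zip("BC", bits) if bit)
            for bits in product([0, 1], repeat=2)]
best = min(patterns, key=worst_case_cost)
print(sorted(best), worst_case_cost(best))
```

Here skipping either station looks cheap under one scenario but expensive under the other, so the robust plan stops at both, paying a small travel-time increase to cap unsatisfied demand in every scenario, the same trade-off the results above report.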

    DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing

    The meaningful use of electronic health records (EHR) continues to progress in the digital era, with clinical decision support systems augmented by artificial intelligence. A priority in improving the provider experience is to overcome information overload and reduce cognitive burden so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error, due to systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis, and potentially to reduce cognitive burden and medical error, has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, coined the Diagnostic Reasoning Benchmark (DR.BENCH), as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework to evaluate pre-trained language models. Experiments with state-of-the-art pre-trained generative language models, using large general-domain models and models continually trained on a medical corpus, demonstrate opportunities for improvement when evaluated in DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community. Comment: Under revie